GBE 



In Silico Ionomics Segregates Parasitic from Free-Living
Eukaryotes

Eva Greganova 1,2, Michael Steinmann 3, Pascal Mäser 1,2,*, and Niklaus Fankhauser 4

1 Swiss Tropical and Public Health Institute, Basel, Switzerland 
2 University of Basel, Switzerland 

3 Institute of Biochemistry and Molecular Medicine, University of Bern, Switzerland
4 Institute of Cell Biology, ETH Zurich, Zurich, Switzerland
*Corresponding author: E-mail: pascal.maeser@unibas.ch.
Accepted: September 6, 2013 

Abstract 

Ion transporters are fundamental to life. Due to their ancient origin and conservation in sequence, ion transporters are also particularly 
well suited for comparative genomics of distantly related species. Here, we perform genome-wide ion transporter profiling as a basis 
for comparative genomics of eukaryotes. From a given predicted proteome, we identify all bona fide ion channels, ion porters, and ion 
pumps. Concentrating on unicellular eukaryotes (n = 37), we demonstrate that clustering of species according to their repertoire of 
ion transporters segregates obligate endoparasites (n = 23) on the one hand, from free-living species and facultative parasites 
(n = 14) on the other hand. This surprising finding indicates strong convergent evolution of the parasites regarding the acquisition
and homeostasis of inorganic ions. Random forest classification identifies transporters of ammonia, plus transporters of iron and other 
transition metals, as the most informative for distinguishing the obligate parasites. Thus, in silico ionomics further underscores the 
importance of iron in infection biology and suggests access to host sources of nitrogen and transition metals to be selective forces in 
the evolution of parasitism. This finding is in agreement with the phenomenon of iron withholding as a primordial antimicrobial 
strategy of infected mammals. 

Key words: parasite genomics, convergent evolution, ion homeostasis. 



Introduction 

Inorganic ions are essential to life. All cells maintain transmembrane
gradients of potassium (K⁺), sodium (Na⁺), calcium (Ca²⁺), and chloride
ions (Cl⁻), and the resulting membrane potential allows electrical signal
transduction and drives nutrient uptake. Other ions such as iron (Fe²⁺),
magnesium (Mg²⁺), copper (Cu²⁺), and zinc (Zn²⁺) are important nutrients
themselves, functioning as cofactors for metalloproteins and stabilizers
of large organic molecules. Polyatomic ions such as sulfate (SO₄²⁻),
nitrate (NO₃⁻), phosphate (PO₄³⁻), or ammonium (NH₄⁺) may serve as
inorganic sources of macronutrients. The majority of essential
micronutrients are ions too, for example, cobalt (Co²⁺), manganese
(Mn²⁺), iodine (I⁻), and molybdenum (MoO₄²⁻). As many of these ions are
harmful at higher concentrations, ion homeostasis is fundamental for cell
function. The importance of ion homeostasis is also illustrated by the
large number of natural toxins that perturb it, by forming ion-conducting
pores in phospholipid bilayers or by interfering with the function of ion
transporters.



The term "ionomics" was coined for high-throughput measurements of
inorganic nutrients (the ionome) in cells and tissues, usually by atomic
emission spectroscopy (Salt et al. 2008). Ionomics quantifies the
elemental composition and, applied to the screening of reverse genetic
mutants, has provided insights into the molecular mechanisms of ion
homeostasis in Saccharomyces cerevisiae (Eide et al. 2005) and
Arabidopsis thaliana (Baxter et al. 2007). Ionomics combined with forward
genetics was used to study natural variation of ion concentrations in
plants (Buescher et al. 2010) and mice (Fleet et al. 2011) and their
potential association with metabolic disorders in humans (Sun et al.
2012). In particular, ionomics has further highlighted the role of ion
transporters as the key players in ion homeostasis. Because ionomics
enables the identification and characterization of transporters (Rus et
al. 2006; Ciavardelli et al. 2010; Lowry et al. 2012), we reasoned that
genome-wide profiling of ion transporters would in turn allow conclusions
to be drawn about the physiology of the ionomes of different organisms.



© The Author(s) 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits
non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com



Genome Biol. Evol. 5(10):1902-1909. doi:10.1093/gbe/evt134 Advance Access publication September 18, 2013






The TC system for Transporter Classification (Saier et al. 
2006, 2009) recognizes three different classes of ion transpor- 
ters: ion channels, ion porters (uniporters, symporters, and 
antiporters), and ion pumps (ATPases). Here, we construct 
profiles for all known families of ion transporters and use 
these profiles for "in silico ionomics," aiming to elucidate 
convergence as well as divergence in the evolution of the 
molecular mechanisms of ion homeostasis in eukaryotes. 

Materials and Methods 

Proteome Files and CEGMA Completeness 

Predicted proteomes were obtained from UniProt (www.uniprot.org, last
accessed October 1, 2013) and Integr8
(ftp://ftp.ebi.ac.uk/pub/databases/integr8/, last accessed October 1,
2013) and tested for completeness as follows. Profiles were downloaded
from the CEGMA database (http://korflab.ucdavis.edu/Datasets/cegma/, last
accessed October 1, 2013) and the full set of 458 core eukaryotic
proteins (Parra et al. 2007) was run with hmmscan of the HMMer 3.0
package (http://hmmer.janelia.org, last accessed October 1, 2013) against
a diverse set of eukaryote reference proteomes: Caenorhabditis elegans,
Chlamydomonas reinhardtii, Dictyostelium discoideum, Drosophila
melanogaster, Danio rerio, Encephalitozoon cuniculi, Entamoeba
histolytica, Giardia lamblia, Homo sapiens, Kluyveromyces lactis,
Leishmania major, Mus musculus, Plasmodium falciparum, S. cerevisiae,
Schizosaccharomyces pombe, Trypanosoma brucei, T. cruzi, Theileria parva,
and Trichomonas vaginalis. The 100 best-scoring profiles returned hits of
expectancy (E) values <10⁻⁵⁰ against all the reference proteomes. These
100 profiles were then used to assess the completeness of additional
proteomes (Ascaris suum, A. thaliana, Brugia malayi, Candida albicans,
Cryptosporidium hominis, C. muris, C. parvum, Cryptococcus neoformans,
Coccomyxa subellipsoidea, Chlorella variabilis, Leishmania braziliensis,
L. infantum, L. mexicana, Magnaporthe grisea, Meloidogyne hapla,
Micromonas pusilla, Ostreococcus lucimarinus, Oryza sativa, Plasmodium
berghei, P. chabaudi, P. knowlesi, P. vivax, P. yoelii, Pediculus
humanus, Polysphondylium pallidum, Paramecium tetraurelia, Schistosoma
japonicum, Theileria annulata, Trypanosoma congolense, T. vivax,
Toxoplasma gondii, Trichinella spiralis, Tetrahymena thermophila), which
were only included if they contained a hit for at least 99 of the 100
profiles at a cutoff of E < 10⁻³⁰.
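
The inclusion rule can be illustrated with a short Python sketch. It is not the code used in this study (the pipeline was written in Perl); the function name, the dictionary layout, and the profile identifiers are assumptions made for illustration. Only the two thresholds (99 of 100 profiles, E < 10⁻³⁰) come from the text.

```python
from typing import Dict

# Thresholds taken from the text: at least 99 of the 100 CEGMA profiles
# must return a hit with E < 1e-30.
E_CUTOFF = 1e-30
MIN_PROFILES_HIT = 99


def is_complete(best_evalues: Dict[str, float],
                e_cutoff: float = E_CUTOFF,
                min_hit: int = MIN_PROFILES_HIT) -> bool:
    """Return True if enough profiles have a hit below the E-value cutoff.

    best_evalues maps a profile name to the lowest E-value found in the
    proteome; profiles without any hit are simply absent from the dict.
    """
    n_hit = sum(1 for e in best_evalues.values() if e < e_cutoff)
    return n_hit >= min_hit


# Hypothetical proteome: 99 profiles with strong hits, one weak hit.
example = {f"profile_{i:03d}": 1e-80 for i in range(99)}
example["profile_099"] = 1e-12
print(is_complete(example))
```

A proteome with two or more profiles failing the cutoff would be excluded by the same function.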

Ion Transporter Reference Sets and Redundancy 
Reduction 

Lists of known families of channels (TC 1.A, n = 37), porters 
(TC 2.A, n = 96), and pumps (TC 3.A, n = 51) were obtained 
from TCDB (http://www.tcdb.org, last accessed October 1, 
2013), and all ion transporter families (n = 78) were sorted 
out by hand. In case of overlap with parental terms (super- 
families), only the nonredundant children (families) were kept 



(n = 75; supplementary table S1, Supplementary Material 
online). For each ion transporter family, all the amino acid 
sequences that had been annotated with the corresponding 
TCDB accession in the manually curated section of UniProt 
were retrieved. Each of these sequence sets was redun- 
dancy-reduced as follows. A Smith-Waterman local alignment 
(Smith and Waterman 1981) was performed for all pairs of 
sequences, and if the resulting score reached 75% or more of 
the score of the self-alignment of the shorter sequence (i.e., 
the maximally attainable alignment score), the shorter se- 
quence was removed from the set. 
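
The redundancy reduction can be sketched as follows. This is an illustrative Python version under stated assumptions, not the original Perl implementation: the scoring scheme (match, mismatch, and linear gap values) is hypothetical because the text does not specify the substitution matrix, and the loop is a greedy longest-first variant that compares each sequence only with those already kept. Only the 75% threshold relative to the self-alignment score of the shorter sequence is taken from the text.

```python
from typing import List


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1,
             gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def reduce_redundancy(seqs: List[str], threshold: float = 0.75) -> List[str]:
    """Drop the shorter sequence of a pair whose alignment score reaches
    `threshold` times the self-alignment score of the shorter sequence."""
    kept: List[str] = []
    # Longest first, so the sequence under test is always the shorter one.
    for s in sorted(seqs, key=len, reverse=True):
        self_score = sw_score(s, s)
        if all(sw_score(s, k) < threshold * self_score for k in kept):
            kept.append(s)
    return kept


# Hypothetical toy sequences: the second is a near-duplicate of the first.
print(reduce_redundancy(["MKTLLVAGGA", "MKTLLVAGG", "WWWWPPPP"]))
```

Relating the score to the self-alignment of the shorter sequence makes the criterion independent of sequence length, so a short fragment fully contained in a longer protein is removed.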

Ion Transportome HMM Library 

Before converting the redundancy-reduced sets of ion transporter
reference sequences into profiles, predicted ankyrin repeats and cyclic
nucleotide binding domains were removed from the sequences. These domains
were identified with hmmsearch of the HMMer 3.0 package using the
profiles Ank (PF00023) and cNMP_binding (PF00027) from Pfam
(http://pfam.sanger.ac.uk/, last accessed October 1, 2013). The parts
matching these profiles with E < 10⁻⁸ were replaced with letters X in
each sequence. Then, a ClustalW multiple alignment (Thompson et al. 1994)
was performed for each sequence set and converted into a
position-dependent scoring matrix with hmmbuild. The resulting profiles
were concatenated to a HMM library for ion transporters. Negative control
libraries were constructed by randomly selecting sets of 61 entries from
the Pfam-A database (version 26.0: 13,672 entries). All the steps as
outlined in figure 1 were carried out with self-made Perl scripts.
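
The masking step can be sketched in Python as below. The function name and the coordinate convention (1-based, inclusive, as commonly reported for domain hits) are assumptions; the original scripts were written in Perl and are not reproduced here.

```python
from typing import Iterable, Tuple


def mask_regions(seq: str, regions: Iterable[Tuple[int, int]]) -> str:
    """Replace residues in each 1-based inclusive (start, end) region by 'X'."""
    residues = list(seq)
    for start, end in regions:
        # Clamp the region to the sequence and convert to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, len(residues))):
            residues[i] = "X"
    return "".join(residues)


# Hypothetical sequence with two domain matches at positions 3-6 and 12-14.
print(mask_regions("MKTAYIAKQRQISFVK", [(3, 6), (12, 14)]))
```

Masking with X rather than deleting the residues keeps the sequence length and the positions of the remaining residues unchanged for the subsequent multiple alignment.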

Screening and Clustering of Proteomes 

The above profile libraries were used to screen predicted proteomes with
hmmscan. When counting the number of hits per profile (fig. 2), every
protein in a given proteome was allowed to score only once, that is, with
the profile against which it had the highest score. A cutoff E-value of
<10⁻¹⁰ was used to call a hit. For clustering (fig. 4), a 65-tuple vector
was constructed for each proteome which consisted of the respective best
scores to each profile. Hierarchical clustering of these vectors was
performed with the R library (R Core Team 2013) Pvclust which implements
multiscale bootstrap resampling (n = 10,000) to estimate "approximately
unbiased" (au) errors, where P = (100 - au)/100 (Suzuki and Shimodaira
2006). Distance metric (Canberra) and clustering algorithm (McQuitty)
were chosen so as to maximize the number of species in significant
clusters (au > 95).
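
The counting rule, the best-score vectors, and the Canberra distance can be sketched as follows. This is a minimal Python illustration under stated assumptions: the tuple layout of a hit, the function names, and the use of 0 for profiles without any hit are hypothetical, and the Canberra function simply skips terms with a zero denominator (implementations differ in how they treat such terms). The actual clustering was done with pvclust in R.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (protein, profile, score, E-value)


def count_hits(hits: Iterable[Hit], e_cutoff: float = 1e-10) -> Counter:
    """Count proteins per profile; each protein counts once, for the
    profile against which it scores highest."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, profile, score, evalue in hits:
        if evalue < e_cutoff and (protein not in best
                                  or score > best[protein][1]):
            best[protein] = (profile, score)
    return Counter(profile for profile, _ in best.values())


def best_score_vector(hits: Iterable[Hit], profiles: List[str]) -> List[float]:
    """Best score of the proteome against each profile (0 if no hit)."""
    top = {p: 0.0 for p in profiles}
    for _, profile, score, _ in hits:
        if profile in top and score > top[profile]:
            top[profile] = score
    return [top[p] for p in profiles]


def canberra(u: List[float], v: List[float]) -> float:
    """Canberra distance; terms with a zero denominator are skipped."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Hypothetical hits for one small proteome.
hits = [("p1", "AMT", 300.0, 1e-90), ("p1", "ZIP", 50.0, 1e-12),
        ("p2", "ZIP", 120.0, 1e-30), ("p3", "NRAMP", 25.0, 1e-5)]
print(count_hits(hits))
print(best_score_vector(hits, ["AMT", "NRAMP", "ZIP"]))
```

Because each term of the Canberra distance is normalized by the magnitude of the two scores, a profile that is absent from one proteome contributes the maximal value of 1 regardless of how high the score is in the other.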

Random Forests 

Decision tree classification was performed using the 
randomForest (Liaw and Wiener 2002) package for the R sta- 
tistical language (R Core Team 2013), which implements an 
ensemble learning method developed by Breiman (2003). A 






Ion transporter families from TCDB > Table S1
• TC 1.A Channels: 35 families
• TC 2.A Porters: 30 families
• TC 3.A Pumps: 10 families

Reference sequences for each family
• based on hand-curated evidence
• redundancy-reduce each set
• purge promiscuous domains (Ank, CNB)

HMM-profile library of ion transporters
• multiple alignment of each set
• position-dependent scoring matrices
• concatenate matrices to a library

In silico ionomics of eukaryote proteomes
• compare the number of hits for each family > Figure 2, Table S2
• compare the best score of each family > Figure 3, Table S3
• clustering based on best score vectors > Figure 4

Fig. 1. — Overview on the in silico approach for ionomics. 



[Figure 2. Panel A: percent of proteome devoted to TM proteins and to ion transporters, for multicellular and unicellular species. Panel B: hits per proteome for channels, porters, and pumps, for multicellular, free-living unicellular, and parasitic unicellular species.]


random forest consisting of 5,000 trees was used. At each node of the
decision trees, the classification quality before and after splitting the
set was quantified using the Gini coefficient, a measure that increases
with the inequality of the predictions (i.e., parasite or nonparasite) in
a set.
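
The node-level quantity can be illustrated with a short Python sketch. It assumes that the measure corresponds to the Gini impurity used by the randomForest package for its "mean decrease Gini" importance; the function names and the toy split are illustrative and not part of the original analysis.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[str]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent: Sequence[str], left: Sequence[str],
                  right: Sequence[str]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of
    the two child nodes produced by a split."""
    n = len(parent)
    weighted = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - weighted


# The 37 unicellular species: 23 obligate parasites and 14 others.
parent = ["parasite"] * 23 + ["nonparasite"] * 14
# Hypothetical split that separates the two classes perfectly.
print(gini_decrease(parent, parent[:23], parent[23:]))
```

A transporter family whose best scores often produce splits with a large decrease of this kind accumulates a high importance over the trees of the forest.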

Results 

A Profile Library for All Known Ion Transporters 

The TCDB transporter database contains more than 600 dif- 
ferent families of transmembrane channels, pores, and porters 
(Saier et al. 2006, 2009). We concentrated on transporters 
which solely have inorganic ions for substrates; ion-dependent 
porters of organic nutrients (e.g., the Na⁺-coupled glucose
transporter) were excluded from the present analysis. A total 
of 75 nonredundant ion transporter families were identified, 
which subdivided as follows: 35 different kinds of ion chan- 
nels; 30 ion anti-, sym-, or uniporters; and 10 ATP-dependent 
ion pumps (supplementary table S1, Supplementary Material 
online). We then constructed for each ion transporter family a 
position-dependent scoring profile as outlined in figure 1. 
Reference protein sequences had been obtained from the 
manually curated section of the UniProt database (Magrane 
2011) and had been redundancy-reduced based on all pair- 
wise alignments (Smith and Waterman 1981). Redundancy 
reduction minimized bias of the set of reference sequences 
while preserving diversity. It turned out to be necessary to 
purge the reference sequences from cyclic nucleotide-binding 
sites (Shabb and Corbin 1992) and ankyrin repeats 



Fig. 2. — Predicted ion transporters in eukaryotes as percentage of the 
total proteome (A) or in absolute numbers (B). The data are in supplementary
table S2, Supplementary Material online.

(Li et al. 2006) before making the profiles as otherwise, 
these promiscuous domains returned hundreds of false-posi- 
tive hits that bore no resemblance to ion channels. The resul- 
tant 75 profiles were concatenated to a HMM library (Eddy 
2009) for ion transporters which is available from the authors 
on request. 

Genome-Wide Prediction of Ion Transporters 

The ion transportome profile library was used to scan pre- 
dicted proteomes from fully sequenced genomes of the dif- 
ferent eukaryote kingdoms, that is, the opisthokonts (fungi, 
animals), archaeplastida (plants, algae), excavates (kinetoplas- 
tids, trichomonas, giardia), chromalveolates (apicomplexa, 
one ciliate), and amoebozoa (entamoeba, slime molds); rhi- 
zaria are still missing from the list of sequenced genomes. Only 
proteomes of a CEGMA completeness >99% were included 
(fig. 1). The aim was to identify all the different ion transpor- 
ters from a given species. The numbers of predicted ion trans- 
porters per proteome varied greatly between the different 
species, from more than 500 in C. elegans or A. thaliana to 
less than 30 in The. parva and E. cuniculi (supplementary table 
S2, Supplementary Material online). Plants and metazoa gen- 
erally possessed more ion transporters than unicellular eukary- 
otes (P< 0.0001, two-tailed Mann-Whitney U test). This held 
true also when the numbers of ion transporters were normal- 
ized by proteome size: Multicellular species devoted a 2-fold 
greater portion of their proteomes to ion transporters than 






unicellular organisms (2.14% vs. 1.06%; P < 0.0001, two-tailed
Mann-Whitney U test). The fraction of multispanning transmembrane
proteins, that is, proteins predicted by Phobius (Käll et al. 2004) to
contain two or more transmembrane domains, was the same in both groups
(13.1% vs. 12.8%; fig. 2A). Overall, the difference between multicellular
and unicellular eukaryotes was most pronounced regarding the predicted
numbers of ion channels (fig. 2B). A striking
exception was Par. tetraurelia which, with 443 predicted ion 
channels, possessed the largest number of different ion chan- 
nel subunits of all the analyzed eukaryotes. This is in agree- 
ment with previous reports (Haynes et al. 2003). Paramecium 
tetraurelia was followed by C. elegans (372 predicted ion 
channels) and Dan. rerio (337 predicted ion channels; supple- 
mentary table S2, Supplementary Material online). Comparing 
the predicted numbers of hits per proteome also indicated 
that free-living or facultative parasitic eukaryotes possess 
more individual ion transporters than obligate endoparasites 
(fig. 2B and supplementary table S2, Supplementary Material 
online). To further investigate this phenomenon, we concen- 
trated on unicellular eukaryotes only (to exclude the strong 



effect arising from the differences between multi- and unicel- 
lularity; fig. 2). 

Clustering Proteomes According to Ionomic Landscape

The numbers of predicted transporters per proteome being 
a somewhat crude and arbitrary measure, we used the 
achieved scores against the ion transporter profiles as a re- 
fined and unbiased parameter of a given proteome. Thus, 
an "ionomic landscape" vector was built for every prote- 
ome, consisting of the top scores against the ion transporter 
profiles of supplementary table S1, Supplementary Material 
online. Figure 3 depicts these vectors as a heat map where 
darker shades represent higher scores. The data are shown 
in supplementary table S3, Supplementary Material online 
(only 65 different ion transporter profiles were used as 10 
appeared to be prokaryote-specific and did not return a hit 
in any of the analyzed eukaryotes). Although the different 
unicellular eukaryotes analyzed achieved similar top scores 
toward the known families of ion pumps (fig. 3, right), the 
situation was different regarding ion channels and ion 



[Figure 3 heat map. Columns: ion channels, ion porters, ion pumps. Color scale: HMMer score from 0 to 400+. Rows:
free-living: C. reinhardtii, C. subellipsoidea, C. variabilis, D. discoideum, K. lactis, M. pusilla, O. lucimarinus, P. pallidum, P. tetraurelia, S. cerevisiae, S. pombe
facultative parasites: C. albicans, C. neoformans, M. grisea
obligate parasites: C. hominis, C. muris, C. parvum, E. cuniculi, E. histolytica, G. lamblia, L. braziliensis, L. infantum, L. major, L. mexicana, P. berghei, P. chabaudi, P. falciparum, P. knowlesi, P. vivax, P. yoelii, T. annulata, T. brucei, T. cruzi, T. gondii, T. parva, T. vaginalis, T. vivax]



Fig. 3. — Ion transporter repertoires of unicellular eukaryotes. The heatmap represents the best HMMer scores achieved by the different proteomes 
(rows) against the profiles for the different families of ion transporters (columns). Profiles that did not return a hit of score >20 in any of the proteomes are 
not shown. The data are in supplementary table S3, Supplementary Material online. 






Fig. 4. — Hierarchical clustering of ionomic landscapes segregates obligate parasites from eukaryotes with free-living life stages. The tree was produced 
with pvclust using Canberra distance and McQuitty's similarity analysis. au values are shown in gray, where P = (100 - au)/100.



porters, which appeared to be generally underrepresented 
in obligate endoparasites as compared with free-living spe- 
cies or facultative parasites (fig. 3, left and middle). Parasites 
such as Cryptosporidia, Microsporidia, or Theileria appeared 
to be devoid of bona fide cation channels. The ionomic 
landscape vectors were hierarchically clustered in an unbi- 
ased way: A selection of distance metrics and clustering 
algorithms were combined with the program pvclust, and 
the resulting trees were ranked based on the number of 
leaves in statistically significant clusters. The best scoring tree 
is shown in figure 4. Its topology deviates from a phyloge- 
netic tree in several aspects. The microsporidian E. cuniculi 
does not cluster with the fungi. The free-living amoebozoa 
D. discoideum and P. pallidum do, whereas Ent. histolytica 
groups with Tri. vaginalis. The ciliate Par. tetraurelia clusters 
with the free-living green algae rather than with alveolates 
(which are all parasitic), and the trypanosomatids are sister 
to Toxoplasma and the malaria parasites. Strikingly, the first 
and main division of the ionomic tree of unicellular eukary- 
otes is into eukaryotes with free-living life stages on one side 
and obligate endoparasites on the other. This separation 
was statistically significant, as the probability of splitting
the 37 analyzed species by chance into the 23 obligate
parasites and 14 facultative parasites or free-living species
equals (23! × 14! × 2)/37! = 3.3 × 10⁻¹⁰. In addition, we
carried out the same procedure of representing and clustering
HMMer hits based on sets of 65 randomly chosen profiles
from the Pfam database of protein families (Punta et al.



2011); a separation of parasites from free-living species 
never occurred (data not shown). 
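
The probability quoted above can be checked with a few lines of Python. The variable names are illustrative, and the factor of 2 is taken directly from the formula in the text. The expression (23! × 14! × 2)/37! is identical to 2 divided by the binomial coefficient C(37, 14), that is, by the number of ways of choosing 14 species out of 37.

```python
from math import comb, factorial

n_parasites, n_others = 23, 14
n_total = n_parasites + n_others

# Formula as written in the text.
p_factorial = (factorial(n_parasites) * factorial(n_others) * 2
               / factorial(n_total))

# Equivalent form using the binomial coefficient C(37, 14).
p_binomial = 2 / comb(n_total, n_others)

print(f"{p_factorial:.2e}")
```

Both expressions agree and round to the value of 3.3 × 10⁻¹⁰ given in the text.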

Convergent Evolution of Parasites 

We concluded that the topology of the tree in figure 4 reflects 
convergent evolution between obligate endoparasitic eukary- 
otes, in particular loss of ion channel and ion porter genes. To 
elucidate which of the ion transporter families contribute the 
strongest signal for the observed distinction of the parasites, 
we performed a random forest classification after having as- 
signed to each species an attribute parasite or nonparasite, the 
latter also comprising facultative parasitic species. The same 
input vectors were used as for hierarchical clustering (fig. 4). 
The random forest method generated training and validation 
sets by random resampling of the input vectors. A total of 
5,000 trees were used to determine the impact of each ion 
transporter family on prediction accuracy regarding parasite 
status. Gini coefficients served as a measure for inequality 
(node impurity) of the predictions. Figure 5 depicts the 
impact of individual transporter families on the sum of all 
Gini coefficients in the forest. Five families stood out with a 
mean Gini decrease >1, namely the high-affinity ammonia
transporters (AMT; 1.A.11.2), the natural resistance-associated
macrophage proteins (NRAMP; 2.A.55), the BOR1-type
boron transporters (2.A.31.3), the ZIP family of zinc-iron
permeases (2.A.5), and the heavy metal transporter of the ABC-B
superfamily (HMT; 3.A.1.210). BOR1 transporters were missing
in all obligate endoparasites. ZIP and HMT transporters






[Figure 5 plot. Families are listed from top to bottom as in the figure; x axis: mean decrease of Gini coefficient (0.0 to 1.5).]

TC number    Family    Typical substrates
1.A.11.2     AMT       NH₄⁺
2.A.55       NRAMP     Fe²⁺, Mn²⁺
2.A.31.3     BOR1      BO₃³⁻
2.A.5        ZIP       Zn²⁺, Fe²⁺
3.A.1.210    HMT       Fe²⁺, Cd²⁺
3.A.1.202    CFTR      Cl⁻
2.A.37.4     CHX       K⁺
1.A.35.5     MRS2      Mg²⁺
2.A.49       ClC       Cl⁻
2.A.1.9      PHS       PO₄³⁻
1.A.4        TRP-CC    Ca²⁺
2.A.1.8      NNP       NO₃⁻
1.A.1.8      TWIC      K⁺
1.A.1.11     Cav       Ca²⁺
2.A.53       SulP      SO₄²⁻
2.A.67.2     YSL       Fe³⁺
2.A.37.1     Kef       K⁺
1.A.8.10
1.A.1.20
2.A.1.14
1.A.38
1.A.8.11
3.A.1.15
1.A.1.2
1.A.1.7
1.A.1.19
1.A.23
2.A.38.3



Fig. 5. — Random forest analysis measuring the effect of each ion transporter family on the ability to distinguish obligate endoparasites. The typical 
substrates of the transporters are indicated on the right. 



were present but different, consistently returning lower scores 
against the respective profiles than the hits from nonparasitic 
eukaryotes. AMT transporters were absent except in T. cruzi, 
and NRAMP transporters only occurred in Plasmodium spp. 
and Tox. gondii (supplementary table S3, Supplementary 
Material online). Note that only the combined information 
from the various ion transporter families distinguished the 
parasites. 

Discussion 

Ion transporters are well suited for comparative genomics of
evolutionarily distant species as 1) most of the known ion transporter
families are ancient, as reflected by their ubiquitous occurrence
in bacteria, archaea, and eukaryotes (Ward et al.
2009); and 2) many ion transporters possess conserved pore
loops (MacKinnon 1995) that function as substrate selectivity
filters and are readily detectable in silico. Furthermore, ion
homeostasis is vital for all cells. Here, we describe a novel
approach for comparative genomics and apply it to ion
transporters as summarized in figure 1. We have identified



75 nonredundant families of ion transporters from the 
Transporter Classification Database (supplementary table S1, 
Supplementary Material online), 65 of which occur in eukary- 
otes (supplementary table S2, Supplementary Material online). 
Having constructed hidden Markov model-based profiles for 
each family, we scanned predicted high-quality proteomes 
of various eukaryotes and identified all the bona fide ion 
channels for each species. Although all types of eukaryotes 
devote around 12% of their proteome to multispanning trans- 
membrane proteins (fig. 2), there are marked differences in 
the fractions of ion transporters. One major division appeared 
to be between multicellular (i.e., animals and plants) and uni- 
cellular eukaryotes, the former devoting larger fractions of 
their proteomes to ion channels and ion porters. This can be 
explained by the expansion of gene families in multicellular 
but not in unicellular eukaryotes (Wolf and Koonin 2013), 
with the notable exception of Paramecium. Another, more 
interesting division across the eukaryotes appeared to be be- 
tween obligate parasites and nonparasitic or facultative para- 
sitic species, the obligate parasites possessing fewer individual 
ion transporters (fig. 2). This may be explained by selective 






gene loss in the obligate parasites (Wolf and Koonin 2013). To
further investigate this phenomenon, we concentrated on unicellular
eukaryotes, eliminating the dominant effect of multicellularity
versus unicellularity (fig. 2).

The analyzed unicellular endoparasites were highly hetero- 
geneous regarding their phylogeny (fungi, trypanosomatids, 
apicomplexa, amoebozoa, and excavates) as well as habitat of 
the life stages (intracellular in compartment, intracellular cyto- 
solic, extracellular). However, all the analyzed proteomes were 
similar in that they exhibited a markedly reduced diversity of 
ion channels and ion porters, but not ion pumps (fig. 3). There 
are likely to be unknown families of ion transporters that 
remain to be discovered, and it is conceivable that such fam- 
ilies are overrepresented in the parasites. In any case, using the 
diversity of the presently known families of ion transporters as 
a basis for hierarchical clustering, it was possible to segregate 
obligate endoparasites from free-living and facultative para- 
sitic eukaryotes (i.e., species with a free-living life stage). Such 
a separation between obligate endoparasitic and free-living 
species has not been observed in many related analyses per- 
formed based on predicted gene products other than ion 
channels, for example, enzymes of pyrimidine metabolism 
(Ali et al. 2013), vitamin B (Stoffel et al. 2006), or porphyrin 
synthesis (Godel et al. 2012). To our knowledge, the only 
contender category of proteins that would allow a similar dis- 
tinction is the enzymes of purine de novo synthesis, a pathway 
which appears to be missing in all obligate endoparasitic eu- 
karyotes (Hassan and Coombs 1988; de Koning et al. 2005). 
The picture presented by the ion transporters (fig. 3) is less 
clear-cut as no single protein is absent from all obligate endo- 
parasites while present in the free-living eukaryotes. 
Nevertheless, the present approach to in silico ionomics dem- 
onstrates that there is convergent evolution among unrelated 
parasites with respect to ion transporters (fig. 4). By applying 
random forest classification to the data set, we were able to 
demonstrate that the most distinctive features of the obligate 
parasites were the lack of predicted ammonia transporters 
and transporters of iron or other transition metals (fig. 5). 

All obligate endoparasites salvage nitrogen-containing
nutrients from their hosts such as amino acids, purines, or
pyrimidines. These provide them with an ample source of organic
nitrogen, rendering the transporters of inorganic nitrogen
redundant. Hence, the independent loss of ammonia
transporters can be seen as a consequence of metabolic streamlining in
parasites, with the notable exception of T. cruzi, which is the
only obligate endoparasite among the unicellular eukaryotes
that scores high against the ammonium transporter profile (TC
1.A.11.2; supplementary table S3, Supplementary Material
online). The observed loss of transporters of divalent cations
might have a similar explanation as suggested above for the
ammonium transporter, with the parasites accessing iron from their
hosts in forms other than free Fe²⁺ or Fe³⁺. African trypanosomes
express unique, glycosylphosphatidylinositol-anchored
receptors on their surface for mammalian transferrin, a


glycoprotein that transports Fe³⁺ in the blood. The heterodimeric
receptor is internalized by endocytosis upon binding of
host transferrin (Taylor and Kelly 2010). Trichomonas,
Toxoplasma, and Entamoeba also obtain iron from mamma- 
lian iron-binding proteins such as lactoferrin, transferrin, or 
ferritin (Lopez-Soto et al. 2009; Horvathova et al. 2012; 
Ortiz-Estrada et al. 2012). Malaria parasites probably cover 
their iron requirement from ferric heme, the oxidized end 
product of hemoglobin degradation. Although the majority 
of heme molecules polymerize to hemozoin in the food vac- 
uole, some escape to the parasite's cytosol where they may 
serve as an iron source after degradation (Ginsburg 1999). For 
many other parasites, the pathways of iron salvage remain 
unknown. However, the general importance of iron acquisi- 
tion for pathogens is illustrated by the phenomenon of iron 
withholding, a typical defensive response of mammals (Ganz 
2009; Weinberg 2009). The peptide hormone hepcidin func- 
tions as a master regulator of iron absorption and distribution 
in mammals, and its expression is critically determined by in- 
fection and inflammation (Drakesmith and Prentice 2012). 

We conclude that there is strong convergence among ob- 
ligate endoparasites in the loss of ion transporters (possibly 
combined with divergence in new strategies for ion uptake) 
and propose access to organic ammonia and iron to contrib- 
ute to the selective forces in the evolution of parasitism. 

Supplementary Material 

Supplementary tables S1-S3 are available at Genome Biology 
and Evolution online (http://www.gbe.oxfordjournals.org/). 

Acknowledgments 

This work was supported by the Swiss National Science 
Foundation (Sinergia grant number CRSII3_127300).

Literature Cited 

Ali JA, et al. 2013. Pyrimidine salvage in Trypanosoma brucei bloodstream
forms and the trypanocidal action of halogenated pyrimidines. Mol
Pharmacol. 83:439-453.

Baxter I, et al. 2007. Purdue ionomics information management system.
An integrated functional genomics platform. Plant Physiol. 143:
600-611.

Breiman L. 2003. Manual for setting up, using, and understanding, 
Random Forest v4.0. [cited 2013 Oct 1]. Available from: 
http://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf.

Buescher E, et al. 2010. Natural genetic variation in selected populations of 
Arabidopsis thaliana is associated with ionomic differences. PLoS One 
5:e11081. 

Ciavardelli D, et al. 2010. Phenotypic profile linked to inhibition of the
major Zn influx system in Salmonella enterica: proteomics and
ionomics investigations. Mol Biosyst. 7:608-619.

de Koning HP, Bridges DJ, Burchmore RJ. 2005. Purine and pyrimidine
transport in pathogenic protozoa: from biology to therapy. FEMS
Microbiol Rev. 29:987-1020.

Drakesmith H, Prentice AM. 2012. Hepcidin and the iron-infection axis.
Science 338:768-772.






Eddy SR. 2009. A new generation of homology search tools based on 
probabilistic inference. Genome Inform. 23:205-211. 

Eide DJ, et al. 2005. Characterization of the yeast ionome: a genome-wide 
analysis of nutrient mineral and trace element homeostasis in 
Saccharomyces cerevisiae. Genome Biol. 6:R77. 

Fleet JC, Replogle R, Salt DE. 2011. Systems genetics of mineral metabolism.
J Nutr. 141:520-525.

Ganz T. 2009. Iron in innate immunity: starve the invaders. Curr Opin 
Immunol. 21:63-67. 

Ginsburg H. 1999. Iron acquisition by Plasmodium spp. Parasitol Today 15:
466.

Godel C, et al. 2012. The genome of the heartworm, Dirofilaria immitis, 
reveals drug and vaccine targets. FASEB J. 26:4650-4661.

Hassan HF, Coombs GH. 1988. Purine and pyrimidine metabolism in par- 
asitic protozoa. FEMS Microbiol Rev. 4:47-83. 

Haynes WJ, Ling KY, Saimi Y, Kung C. 2003. Pak paradox: Paramecium 
appears to have more K(+)-channel genes than humans. Eukaryot 
Cell. 2:737-745. 

Horvathova L, et al. 2012. Transcriptomic identification of iron-regulated 
and iron-independent gene copies within the heavily duplicated 
Trichomonas vaginalis genome. Genome Biol Evol. 4:1017-1029. 

Käll L, Krogh A, Sonnhammer EL. 2004. A combined transmembrane
topology and signal peptide prediction method. J Mol Biol. 338: 
1027-1036. 

Li J, Mahajan A, Tsai MD. 2006. Ankyrin repeat: a unique motif mediating 
protein-protein interactions. Biochemistry 45:15168-15178. 

Liaw A, Wiener M. 2002. Classification and regression by random forest. R 
News 2:18-22. 

Lopez-Soto F, et al. 2009. Entamoeba histolytica uses ferritin as an iron 
source and internalises this protein by means of clathrin-coated vesicles.
Int J Parasitol. 39:417-426.

Lowry DB, et al. 2012. Mapping of ionomic traits in Mimulus guttatus 
reveals mo and cd qtls that colocalize with mot1 homologues. PLoS 
One 7:e30730. 

MacKinnon R. 1995. Pore loops: an emerging theme in ion channel struc- 
ture. Neuron 14:889-892. 

Magrane M, UniProt Consortium. 2011. Uniprot knowledgebase: a hub of
integrated protein data. Database (Oxford) 2011:bar009.

Ortiz-Estrada G, et al. 2012. Iron-saturated lactoferrin and pathogenic 
protozoa: could this protein be an iron source for their parasitic style 
of life? Future Microbiol. 7:149-164. 

Parra G, Bradnam K, Korf I. 2007. CEGMA: a pipeline to accurately annotate
core genes in eukaryotic genomes. Bioinformatics 23:1061-1067.



Punta M, et al. 2011. The pfam protein families database. Nucleic Acids
Res. 40:D290-D301.

R Core Team. 2013. R: a language and environment for statistical
computing. Vienna (Austria): R Foundation for Statistical
Computing.

Rus A, et al. 2006. Natural variants of AtHKT1 enhance Na+ accumulation
in two wild populations of arabidopsis. PLoS Genet. 2:e210.

Saier MH Jr, Tran CV, Barabote RD. 2006. TCDB: the transporter classifi- 
cation database for membrane transport protein analyses and infor- 
mation. Nucleic Acids Res. 34:D181-D186. 

Saier MH Jr, Yen MR, Noto K, Tamang DG, Elkan C. 2009. The transporter 
classification database: recent advances. Nucleic Acids Res. 37: 
D274-D278. 

Salt DE, Baxter I, Lahner B. 2008. Ionomics and the study of the plant
ionome. Annu Rev Plant Biol. 59:709-733. 

Shabb JB, Corbin JD. 1992. Cyclic nucleotide-binding domains in proteins 
having diverse functions. J Biol Chem. 267:5723-5726. 

Smith TF, Waterman MS. 1981. Identification of common molecular sub- 
sequences. J Mol Biol. 147:195-197. 

Stoffel SA, et al. 2006. Biosynthesis and uptake of thiamine (vitamin B1) 
in bloodstream form Trypanosoma brucei brucei and interference 
of the vitamin with melarsen oxide activity. Int J Parasitol. 36: 
229-236. 

Sun L, et al. 2012. Associations between ionomic profile and metabolic 
abnormalities in human population. PLoS One 7:e38845. 

Suzuki R, Shimodaira H. 2006. Pvclust: an R package for assessing 
the uncertainty in hierarchical clustering. Bioinformatics 22: 
1540-1542. 

Taylor MC, Kelly JM. 2010. Iron metabolism in trypanosomatids, and its 
crucial role in infection. Parasitology 137:899-917. 

Thompson JD, Higgins DG, Gibson TJ. 1994. Clustal W: improving the 
sensitivity of progressive multiple sequence alignment through se- 
quence weighting, position-specific gap penalties and weight matrix 
choice. Nucleic Acids Res. 22:4673-4680. 

Ward JM, Mäser P, Schroeder JI. 2009. Plant ion channels: gene families,
physiology, and functional genomics analyses. Annu Rev Physiol. 71: 
59-82. 

Weinberg ED. 2009. Iron availability and infection. Biochim Biophys Acta. 
1790:600-605. 

Wolf YI, Koonin EV. 2013. Genome reduction as the dominant mode of
evolution. Bioessays 35:829-837. 

Associate editor: Geoff McFadden 



Genome Biol. Evol. 5(10):1902-1909. doi:10.1093/gbe/evt134 Advance Access publication September 18, 2013






